CASE STUDY - 1 :: Healthcare Provider Fraud Detection



This notebook contains the extensive data analysis that has been carried out on the publicly available dataset at Kaggle with the intent:

Kindly check out the link below for BUSINESS-related insights into this problem:

Kindly check out the link below for a TECHNICAL description of this problem:

Notebook Contents

Downloading_Train_Data_Files

Importing_Libraries

Importing_Dataset

Exploring_BENEFICIARY_Data

Q1. How many unique beneficiaries do we have in our dataset?

Q2. How many records do we have at the GENDER level?

Q3. Let's calculate the AGE of every BENEFICIARY.

Q4. Let's see the ratio of GENDER across the various HUMAN RACE groups.

Q5. Let's see the number of beneficiaries with Chronic Renal Disease.

Q6. Let's see the number of beneficiaries on the basis of State Codes.

Q7. Let's see the number of beneficiaries on the basis of County Codes.

Q8.1 Let's see the number of beneficiaries on the basis of 'NoOfMonths_PartACov'.

Q8.2 Let's see the number of beneficiaries on the basis of 'NoOfMonths_PartBCov'.

Q9. Let's see the number of beneficiaries on the basis of 'ChronicCond_Alzheimer', and the annual IP & OP expenditures for such patients.

Q10. Let's see the number of beneficiaries on the basis of 'ChronicCond_Heartfailure', and the annual IP & OP expenditures for such patients.

Q11. Let's see the number of beneficiaries on the basis of 'ChronicCond_KidneyDisease', and the annual IP & OP expenditures for such patients.

Q12. Let's see the number of beneficiaries on the basis of 'ChronicCond_Cancer', and the annual IP & OP expenditures for such patients.

Q13. Let's see the number of beneficiaries on the basis of 'ChronicCond_ObstrPulmonary', and the annual IP & OP expenditures for such patients.

Q14. Let's see the number of beneficiaries on the basis of 'ChronicCond_Depression', and the annual IP & OP expenditures for such patients.

Q15. Let's see the number of beneficiaries on the basis of 'ChronicCond_Diabetes', and the annual IP & OP expenditures for such patients.

Q16. Let's see the number of beneficiaries on the basis of 'ChronicCond_IschemicHeart', and the annual IP & OP expenditures for such patients.

Q17. Let's see the number of beneficiaries on the basis of 'ChronicCond_Osteoporasis', and the annual IP & OP expenditures for such patients.

Q18. Let's see the number of beneficiaries on the basis of 'ChronicCond_rheumatoidarthritis', and the annual IP & OP expenditures for such patients.

Q19. Let's see the number of beneficiaries on the basis of 'ChronicCond_stroke', and the annual IP & OP expenditures for such patients.

Q20. Let's see the number of beneficiaries on the basis of 'RenalDiseaseIndicator', and the annual IP & OP expenditures for such patients.

Q21. Let's check the percentiles of the annual IP expenditures for patients with each pre-disease indicator.

Q22. Let's visualize the spread of the annual IP and OP expenditures by pre-disease indicator across males and females.

Q23. Let's visualize the spread of the annual IP and OP expenditures throughout AGE and its associated features for males and females.

SUMMARY

Downloading_Train_Data_Files

Importing_Libraries

Importing_Dataset

Exploring_BENEFICIARY_Data

Q1. How many unique beneficiaries do we have in our dataset?
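A minimal sketch of how this count could be obtained with pandas. The column name `BeneID` and the toy rows are assumptions made for illustration; they are not taken from the dataset itself.

```python
import pandas as pd

# Toy stand-in for the beneficiary table (illustrative values only).
# Assumption: the beneficiary identifier column is called "BeneID".
bene = pd.DataFrame({"BeneID": ["BENE1", "BENE2", "BENE3", "BENE2"]})

n_rows = len(bene)                      # total number of records
n_unique = bene["BeneID"].nunique()     # number of distinct beneficiaries
has_duplicates = n_unique < n_rows      # True when some beneficiary repeats

print(f"rows={n_rows}, unique beneficiaries={n_unique}, duplicates={has_duplicates}")
```

Comparing the two counts also shows whether the table holds exactly one row per beneficiary.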

Q2. How many records do we have at the GENDER level?

OBSERVATION

Q3. Let's calculate the AGE of every BENEFICIARY.
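One possible way to derive an age and a life-status flag is sketched below. It assumes date columns named `DOB` and `DOD`, and it assumes that living beneficiaries (missing `DOD`) are aged as of the latest date of death seen in the data. Both the column names and the reference-date choice are assumptions, not necessarily what this notebook did.

```python
import pandas as pd

# Toy rows (illustrative only); None in DOD means no recorded date of death.
bene = pd.DataFrame({
    "DOB": ["1943-01-01", "1936-09-01", "1920-05-01"],
    "DOD": [None, "2009-12-01", None],
})
bene["DOB"] = pd.to_datetime(bene["DOB"])
bene["DOD"] = pd.to_datetime(bene["DOD"])

# Assumption: use the latest recorded date of death as the reference date
# for beneficiaries who are still alive.
ref_date = bene["DOD"].max()
end_date = bene["DOD"].fillna(ref_date)

# Age in completed years (approximate: 365.25 days per year).
bene["AGE"] = ((end_date - bene["DOB"]).dt.days // 365.25).astype(int)

# Life status: alive when no date of death is recorded.
bene["IS_ALIVE"] = bene["DOD"].isna()

print(bene[["AGE", "IS_ALIVE"]])
```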

OBSERVATION

REASONING

OBSERVATION

REASONING

Q4. Let's see the ratio of GENDER across the various HUMAN RACE groups.

This is based on a racial classification made by Carleton S. Coon in 1962. Refer here.
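The gender ratio within each race group can be sketched with a row-normalized cross-tabulation. The column names `Race` and `Gender` and the integer codes below are assumptions used only for illustration.

```python
import pandas as pd

# Toy rows (illustrative only); the codes 1 and 2 are placeholder categories.
bene = pd.DataFrame({
    "Race":   [1, 1, 1, 1, 2, 2],
    "Gender": [1, 2, 2, 2, 1, 2],
})

# normalize="index" makes every row (race group) sum to 1,
# so each cell is the share of that gender within the race group.
gender_share = pd.crosstab(bene["Race"], bene["Gender"], normalize="index")

print(gender_share)
```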

Q5. Let's see the number of beneficiaries with Chronic Renal Disease.

For more information, refer to link-1 and link-2.

I found this link useful for understanding the difference between the two. It looks like RenalDiseaseIndicator represents whether the beneficiary has or has had kidney failure, whereas ChronicCond_KidneyDisease represents long-term kidney disease, for example kidneys that are not functioning at full capacity.
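To see how the two kidney-related flags overlap, a cross-tabulation is one option. The sketch below assumes `RenalDiseaseIndicator` is coded as `"Y"`/`"0"` and `ChronicCond_KidneyDisease` as `1` (condition present) / `2` (absent); these encodings are assumptions and should be checked against the actual data.

```python
import pandas as pd

# Toy rows (illustrative only).
# Assumed encodings: "Y"/"0" for the renal indicator, 1/2 for the chronic flag.
bene = pd.DataFrame({
    "RenalDiseaseIndicator":     ["Y", "0", "0", "Y", "0"],
    "ChronicCond_KidneyDisease": [1, 1, 2, 1, 2],
})

# Rows: renal indicator value; columns: chronic kidney disease flag.
overlap = pd.crosstab(
    bene["RenalDiseaseIndicator"], bene["ChronicCond_KidneyDisease"]
)

print(overlap)
```

If beneficiaries with the renal indicator set almost always carry the chronic flag too, that would be consistent with the interpretation above.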

Q6. Let's see the number of beneficiaries on the basis of State Codes.

Q7. Let's see the number of beneficiaries on the basis of County Codes.

Q8.1 Let's see the number of beneficiaries on the basis of 'NoOfMonths_PartACov'.

Q8.2 Let's see the number of beneficiaries on the basis of 'NoOfMonths_PartBCov'.
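A quick check of how concentrated the two coverage columns are, sketched on toy values. The real distribution is not reproduced here.

```python
import pandas as pd

# Toy rows (illustrative only).
bene = pd.DataFrame({
    "NoOfMonths_PartACov": [12, 12, 12, 12, 0],
    "NoOfMonths_PartBCov": [12, 12, 12, 12, 6],
})

coverage_cols = ["NoOfMonths_PartACov", "NoOfMonths_PartBCov"]

# value_counts sorts by frequency, so the first entry is the most common value.
dominant_share = {}
for col in coverage_cols:
    shares = bene[col].value_counts(normalize=True)
    dominant_share[col] = shares.iloc[0]
    print(col, "most common value:", shares.index[0], "share:", shares.iloc[0])
```

A column whose most common value covers nearly all rows carries little information for separating beneficiaries.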

Q9. Let's see the number of beneficiaries on the basis of 'ChronicCond_Alzheimer', and the annual IP & OP expenditures for such patients.
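Since the same question is repeated for every chronic-condition flag, a small helper can be reused. This is a sketch: `summarize_by_flag` is a hypothetical helper, and the amount column names `IPAnnualReimbursementAmt` and `OPAnnualReimbursementAmt` are assumptions.

```python
import pandas as pd

def summarize_by_flag(df, flag_col, amount_cols):
    """Count beneficiaries and total the given amounts per value of flag_col."""
    grouped = df.groupby(flag_col)
    summary = grouped[amount_cols].sum()
    # Add the group sizes as the first column (aligned on the flag values).
    summary.insert(0, "beneficiaries", grouped.size())
    return summary

# Toy rows (illustrative only).
bene = pd.DataFrame({
    "ChronicCond_Alzheimer":    [1, 2, 1, 2],
    "IPAnnualReimbursementAmt": [1000, 0, 3000, 500],
    "OPAnnualReimbursementAmt": [200, 100, 400, 300],
})

summary = summarize_by_flag(
    bene,
    "ChronicCond_Alzheimer",
    ["IPAnnualReimbursementAmt", "OPAnnualReimbursementAmt"],
)
print(summary)
```

The same call works for the other condition columns by changing the `flag_col` argument.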

Q10. Let's see the number of beneficiaries on the basis of 'ChronicCond_Heartfailure', and the annual IP & OP expenditures for such patients.

Q11. Let's see the number of beneficiaries on the basis of 'ChronicCond_KidneyDisease', and the annual IP & OP expenditures for such patients.

Q12. Let's see the number of beneficiaries on the basis of 'ChronicCond_Cancer', and the annual IP & OP expenditures for such patients.

Q13. Let's see the number of beneficiaries on the basis of 'ChronicCond_ObstrPulmonary', and the annual IP & OP expenditures for such patients.

Q14. Let's see the number of beneficiaries on the basis of 'ChronicCond_Depression', and the annual IP & OP expenditures for such patients.

Q15. Let's see the number of beneficiaries on the basis of 'ChronicCond_Diabetes', and the annual IP & OP expenditures for such patients.

Q16. Let's see the number of beneficiaries on the basis of 'ChronicCond_IschemicHeart', and the annual IP & OP expenditures for such patients.

Q17. Let's see the number of beneficiaries on the basis of 'ChronicCond_Osteoporasis', and the annual IP & OP expenditures for such patients.

Q18. Let's see the number of beneficiaries on the basis of 'ChronicCond_rheumatoidarthritis', and the annual IP & OP expenditures for such patients.

Q19. Let's see the number of beneficiaries on the basis of 'ChronicCond_stroke', and the annual IP & OP expenditures for such patients.

Q20. Let's see the number of beneficiaries on the basis of 'RenalDiseaseIndicator', and the annual IP & OP expenditures for such patients.

Q21. Let's check the percentiles of the annual IP expenditures for patients with each pre-disease indicator.
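Percentiles per indicator value can be sketched with a grouped quantile. The chosen condition column, the amount column name, and the percentile levels are assumptions for illustration.

```python
import pandas as pd

# Toy rows (illustrative only); 1/2 is an assumed yes/no encoding.
bene = pd.DataFrame({
    "ChronicCond_Diabetes":     [1, 1, 1, 2, 2, 2],
    "IPAnnualReimbursementAmt": [1000, 2000, 9000, 0, 0, 3000],
})

# One row per indicator value, one column per requested percentile
# (pandas uses linear interpolation between data points by default).
pcts = (
    bene.groupby("ChronicCond_Diabetes")["IPAnnualReimbursementAmt"]
    .quantile([0.5, 0.9])
    .unstack()
)

print(pcts)
```

Looking at the upper percentiles rather than the mean helps when the expenditure distribution is heavily right-skewed.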

Q22. Let's visualize the spread of the annual IP and OP expenditures by pre-disease indicator across males and females.

Q23. Let's visualize the spread of the annual IP and OP expenditures throughout AGE and its associated features for males and females.
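Before plotting, the numbers behind such a chart could be prepared by binning age and averaging expenditures per gender. The bin edges, labels, and column names below are hypothetical choices for illustration.

```python
import pandas as pd

# Toy rows (illustrative only).
bene = pd.DataFrame({
    "AGE":    [45, 67, 72, 88, 91, 66],
    "Gender": [1, 2, 1, 2, 2, 1],
    "IPAnnualReimbursementAmt": [0, 2000, 1000, 5000, 7000, 3000],
})

# Hypothetical age bands; intervals are right-inclusive, e.g. (65, 75].
bene["AGE_GROUP"] = pd.cut(
    bene["AGE"],
    bins=[0, 65, 75, 85, 120],
    labels=["<=65", "66-75", "76-85", ">85"],
)

# Mean annual IP amount per age group (rows) and gender (columns).
# observed=True keeps only age groups that actually occur in the data.
mean_ip = (
    bene.groupby(["AGE_GROUP", "Gender"], observed=True)["IPAnnualReimbursementAmt"]
    .mean()
    .unstack()
)

print(mean_ip)
```

The resulting table can be passed to a bar or line plot; cells with no beneficiaries show up as NaN.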

SUMMARY

  1. Based on the initial analysis above, the following features do not appear to provide much information or differentiation, but I would still like to check them again after adding the CLAIMS data:
    • DOB YEAR
    • DOB MONTH
    • AGE GROUPS
    • LIFE STATUS
    • HUMAN RACE
    • STATE
  2. For the following features, the majority of the values are the same, so they most probably won't be of any use and are removed from the BENE dataset:
    • NoOfMonths_PartACov
    • NoOfMonths_PartBCov
  3. The pre-disease indicators look like important features based on the initial analysis, so it will be interesting to see how useful they are after adding the CLAIMS dataset.
  4. Date of Death is also removed from the dataset, as the beneficiary age, life status, and other features have already been derived from it.
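The column removals described in the summary could look like the sketch below. The date-of-death column name `DOD` and the toy rows are assumptions.

```python
import pandas as pd

# Toy rows (illustrative only).
bene = pd.DataFrame({
    "BeneID": ["BENE1", "BENE2"],
    "DOD": [None, "2009-12-01"],
    "NoOfMonths_PartACov": [12, 12],
    "NoOfMonths_PartBCov": [12, 12],
    "ChronicCond_Diabetes": [1, 2],
})

# Near-constant coverage columns plus the date of death
# (assumed to be named "DOD"), whose information is already captured
# by the derived age and life-status features.
cols_to_drop = ["NoOfMonths_PartACov", "NoOfMonths_PartBCov", "DOD"]
bene_reduced = bene.drop(columns=cols_to_drop)

print(list(bene_reduced.columns))
```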